Description
TECHNICAL FIELD 

[0001] The invention relates to maintaining synchro- 
nized execution by processors in fault resilient/fault tol- 
erant computer systems. 

BACKGROUND 

[0002] Computer systems that are capable of surviv- 
ing hardware failures or other faults generally fall into 
three categories: fault resilient, fault tolerant, and disas- 
ter tolerant. 

[0003] Fault resilient computer systems can continue 
to function, often in a reduced capacity, in the presence 
of hardware failures. These systems operate in either 
an availability mode or an integrity mode, but not both. 
A system is "available" when a hardware failure does 
not cause unacceptable delays in user access, which 
means that a system operating in an availability mode 
is configured to remain online, if possible, when faced 
with a hardware error. A system has data integrity when 
a hardware failure causes no data loss or corruption,
which means that a system operating in an integrity
mode is configured to avoid data loss or corruption, even
if the system must go offline to do so. 
[0004] Fault tolerant systems stress both availability 
and integrity. A fault tolerant system remains available
and retains data integrity when faced with a single hardware
failure, and, under some circumstances, when
faced with multiple hardware failures. 
[0005] Disaster tolerant systems go beyond fault tol- 
erant systems. In general, disaster tolerant systems re- 
quire that loss of a computing site due to a natural or 
man-made disaster will not interrupt system availability 
or corrupt or lose data. 

[0006] All three cases require an alternative compo- 
nent that continues to function in the presence of the 
failure of a component. Thus, redundancy of compo- 
nents is a fundamental prerequisite for a disaster toler- 
ant, fault tolerant or fault resilient system that recovers 
from or masks failures. Redundancy can be provided 
through passive redundancy or active redundancy, each 
of which has different consequences. 
[0007] A passively redundant system, such as a
checkpoint-restart system, provides access to alternative
components that are not associated with the current
task and must be either activated or modified in some
way to account for a failed component. The consequent
transition may cause a significant interruption of service.
Subsequent system performance also may be degraded.
Examples of passively redundant systems include
stand-by servers and clustered systems. The mechanism
for handling a failure in a passively redundant system
is to "fail-over", or switch control, to an alternative
server. The current state of the failed application may
be lost, and the application may need to be restarted in
the other system. The fail-over and restart processes
may cause some interruption or delay in service to the
users. Despite any such delay, passively redundant systems
such as stand-by servers and clusters provide
"high availability" but do not deliver the continuous
processing usually associated with "fault tolerance."
[0008] An actively redundant system, such as a replication
system, provides an alternative processor that
concurrently processes the same task and, in the presence
of a failure, provides continuous service. The
mechanism for handling failures is to compute through
a failure on the remaining processor. Because at least
two processors are looking at and manipulating the
same data at the same time, the failure of any single
component should be invisible both to the application
and to the user.

[0009] The goal of a fault tolerant system is to produce
correct results in a repeatable fashion. Repeatability ensures
that operations may be resumed after a fault is
detected. In a checkpoint-restart system, this entails
rolling back to a previous checkpoint and replaying the 
inputs again from a journal file. In a replication system, 
repeatability results from simultaneous operation on 
multiple instances of a computer. 
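
The checkpoint-restart form of repeatability can be illustrated with a minimal Python sketch. The class and method names are hypothetical and are not taken from any described system; the sketch only assumes that inputs are journaled before they are applied and that recovery restores the last checkpoint and replays the journal in order.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointedCounter:
    """Toy service whose entire state is a running total (illustrative only)."""
    total: int = 0
    checkpoint: int = 0
    journal: list = field(default_factory=list)

    def apply(self, value: int) -> None:
        self.journal.append(value)  # journal the input before applying it
        self.total += value

    def take_checkpoint(self) -> None:
        self.checkpoint = self.total
        self.journal.clear()  # earlier inputs are captured by the checkpoint

    def recover(self) -> None:
        self.total = self.checkpoint  # roll back to the previous checkpoint
        for value in self.journal:    # replay the inputs in their original order
            self.total += value
```

After a fault corrupts `total`, calling `recover()` reproduces the state implied by the checkpoint plus the journaled inputs.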

[0010] Many fault tolerant designs are known for single
processor systems. There also are a few known fault
tolerant, symmetric multi-processing ("SMP") systems.
The extra complexity associated with providing fault tolerance
in an SMP system causes problems for many
traditional approaches to fault tolerance.

[0011] For a checkpoint-restart system, the checkpoint
information is somewhat more complex, but the
recovery algorithm remains basically the same. Repeatability
can be loosely interpreted to permit the replay of
system operation to occur differently than the original
system operation. In other words, the allocation of workload
between SMP processors on the replay does not
have to follow the allocation that was being followed
when the fault occurred. The order of the inputs must
be preserved, but the relative timing of the inputs to each
other and to the instruction streams running on the different
processors does not need to be preserved.
[0012] Under this loose repeatability standard, a replay
is valid as long as the results produced by the replay
are proper for the sequence of inputs. An example is an
airline reservation system with multiple customers (e.g.,
Mr. Smith and Ms. Jones) competing for the last seat.
Due to input timing and processor scheduling, Ms.
Jones gets the seat. However, before the result is posted,
a fault occurs. On the replay, Mr. Smith gets the seat.
Though producing a different result, the replay is valid
since there is no cognizable problem associated with the
change in result (i.e., Ms. Jones will never know she almost
got the seat).

[0013] SMP adds considerable complexity to replication
systems. Corresponding processors in corresponding
systems must produce the same results at the same
time. The input timing must be precisely preserved with
respect to the multiple instruction streams. No difference
between processor arbitration cycles is allowed,
because such a difference can affect who gets what resource
first. Making an SMP system with replication requires
control of all aspects of the system that can affect
the timing of input data and the arbitration between
processors.

[0014] For these reasons, fault tolerant SMP systems
generally are produced using the checkpoint-restart approach.
In such systems, the application and operating
system software must be specially designed to support 
checkpoints. 

[0015] The document EP-A-0 286 856 teaches a fault
tolerant symmetric multiprocessing system with strongly
coupled compute elements.

SUMMARY 

[0016] The invention, various aspects of which are described
below, is defined in detail in the appended
claims 1 and 24.

[0017] In one general aspect, a fault tolerant/fault re- 
silient computer system includes at least two compute 
elements connected to at least one controller. Each of 
the compute elements has clocks that operate asyn- 
chronously to clocks of the other compute elements. 
The compute elements operate in a first mode in which
the compute elements each execute a first stream of instructions
in emulated clock lockstep. Clock lockstep
operation requires the compute elements to perform the
same sequence of instructions in the same order, with
each instruction being performed in the same clock cycle
by each compute element. The compute elements
also operate in a second mode in which the compute 
elements each execute a second stream of instructions 
in instruction lockstep. Instruction lockstep operation re- 
quires the compute elements to perform the same se- 
quence of instructions in the same order, but does not 
require the compute elements to perform the instructions
in the same clock cycle.
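
The difference between the two lockstep definitions can be expressed as a short Python sketch. The representation of an execution trace as a list of `(clock_cycle, instruction)` pairs is an assumption made purely for illustration.

```python
def instruction_lockstep(trace_a, trace_b):
    """True if both traces contain the same instructions in the same order.

    Each trace is a list of (clock_cycle, instruction) pairs; the cycle
    numbers are ignored here.
    """
    return [op for _, op in trace_a] == [op for _, op in trace_b]


def clock_lockstep(trace_a, trace_b):
    """True if every instruction also occurs in the same clock cycle."""
    return list(trace_a) == list(trace_b)
```

Two traces that satisfy `clock_lockstep` always satisfy `instruction_lockstep`; the converse does not hold, for example when one compute element stalls for a few cycles before an instruction.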

[0018] Implementations of the computer system may
include one or more of the following features. For exam- 
ple, each compute element may be a multi-processor 
compute element, such as a symmetric multi-processor 
(SMP) compute element. Each compute element may 
be implemented using an industry standard mother- 
board. The system may be configured to deactivate all 
but one of the processors of each compute element 
when the compute elements are operating in the second 
mode. 

[0019] The first stream of instructions may implement
operating system and application software, while the 
second stream of instructions implements lockstep con- 
trol software. The operating system and application soft- 
ware may be unmodified software configured for use 
with computer systems that are not fault tolerant. 
[0020] Each compute element may include one or
more processors, memory, and a connection to the controller.
The compute elements may be configured so that
refresh operations associated with the memory are synchronized
with execution of operations by the processor.
The system also may be configured to initiate DMA
transfers to the memory when the compute elements
are operating in the second mode and to execute the 
initiated DMA transfers when the compute elements are 
operating in the first mode. 

[0021] The system may synchronize the compute elements
by copying contents of the memory of a first
compute element to the memory of a second compute 
element, and resetting the processors of the first and 
second compute elements in a way that does not affect 
the memories of the compute elements. 

[0022] The compute elements may transition from the
first mode of operation to the second mode of operation
in response to an interrupt. For example, the interrupt
may be a performance counter interrupt generated by
the compute element after the occurrence of a fixed
number of clock cycles, such as processor clock cycles
or bus clock cycles. The interrupt also may be generated
after the execution of a fixed number of instructions.
When the compute elements are multi-processor compute
elements having primary processors and one or
more secondary processors, the primary processor may
be configured to halt operation of the secondary processors
in response to the interrupt.
[0023] Each compute element may generate an interrupt
during the transition from the second mode of operation
to the first mode of operation. This interrupt
serves to align the processing by the compute element
with a clocking structure of the compute element. Typically,
the interrupt is synchronized with the clock having
the lowest frequency in the clocking structure.

[0024] The system may redirect I/O operations by the
compute elements to the controller. The system also
may include a second controller connected to the first
controller and to the two compute elements. The first
controller and a first compute element may be located
in a first location and the second controller and a second
compute element may be located in a second location,
in which case the system also may include a communications
link connecting the first controller to the second
controller, the first controller to the second compute element,
and the second controller to the first compute
element. The first location may be spaced from the second
location by more than 5 meters, by more than 100
meters, or even by a kilometer or more.
[0025] A benefit of creating a fault resilient/fault tolerant
SMP system using replication is that the system
can run standard application and operating system software,
such as the Windows NT operating system available
from Microsoft Corporation. In addition, the system
can do so using industry-standard processors and
motherboards, such as motherboards based on Pentium
series processors available from Intel Corporation.
[0026] Other features and advantages will be apparent
from the following description, including the drawings,
and from the claims.

DESCRIPTION OF DRAWINGS 
[0027] 

Figs. 1 and 2 are block diagrams of a fault resilient/ 
fault tolerant uni-processor computer system. 
Fig. 3 is a block diagram of a fault resilient/fault tol- 
erant multi-processor computer system. 
Fig. 4 is a block diagram of a motherboard. 
Fig. 5 is a flow chart of a procedure implemented 
by the system of Fig. 3. 
Fig. 6 is a block diagram of a PCI interface.
Fig. 7 is a flow chart of a procedure implemented
by the system of Fig. 3. 

Fig. 8 is a block diagram of a system having two 
multi-processor compute elements and one I/O 
processor. 

Figs. 9A and 9B are a flow chart of a procedure im- 
plemented by the system of Fig. 8. 

DETAILED DESCRIPTION 

[0028] The fault tolerant systems described below 
emulate fully-phase-locked operation of multiple in- 
stances of a compute element. This should be contrast- 
ed to prior systems that operated multiple instances of 
a compute element in instruction lockstep, such as the
Endurance 4000 system available from Marathon Tech- 
nologies Corporation of Boxboro, Massachusetts. In- 
struction lockstep operation occurs when multiple in- 
stances of a compute element perform the same se- 
quence of instructions in the same order. Fully-phase- 
locked operation, which also may be referred to as clock 
lockstep operation, occurs when multiple instances of a 
compute element perform the same sequence of in- 
structions in the same order, with each instruction being 
performed in the same clock cycle by each instance of 
the compute element. 

[0029] In the Endurance 4000 system, the instances
of a compute element operate in instruction stream lockstep.
Each compute element executes the same sequence
of instructions prior to producing an output. The
time needed to execute the instruction stream varies 
due to the uncontrolled past history of each compute el- 
ement. For example, caches, table lookahead buffers, 
branch prediction logic, speculative execution logic, and 
execution pipelines of the compute elements can have 
different initial values, which, even though the instruc- 
tion streams being executed are the same, result in var- 
ying execution times. 

[0030] Instruction lockstep operation may result in 
failures when the compute elements are SMP servers. 
In such a system, each compute element has multiple 
processors, each with its own instruction stream. The
instruction streams are arbitrating for shared resources.
This arbitration must be resolved identically in both compute
elements for redundant operation. Instruction lockstep
operation does not provide a tight enough control
over the processors and the memory to guarantee the 
same arbitration resolution in both compute elements. 

[0031] Clock lockstep operation may be achieved by
using a common oscillator to provide clocks to all instances
of the compute element. However, such an implementation
may be unsuited for fault tolerant operation
because it includes a single component, the common
oscillator, the failure of which will cause failure of
the entire system. 

[0032] Emulated clock lockstep operation avoids the 
single point of failure and is achieved using the techniques
described below. Emulated clock lockstep operation
offers the considerable additional benefit of permitting
the different instances of a compute element to
be separated by distances of up to a kilometer or more. 
[0033] An emulated-clock-lockstep, non-SMP, fault 
tolerant system is described below. This description is 
followed by a description of a fault tolerant SMP system
using replication and emulated-clock-lockstep operation.
In both systems, the basic approach is to design a
system in which multiple instances of a compute element
are initialized into exactly the same state and then
provided with exactly the same input stimuli from a synchronous
I/O subsystem. This causes each instance to
produce exactly the same result. 
[0034] To progress a fault tolerant non-SMP (uni-processor)
implementation to a fault resilient/fault tolerant
SMP implementation, each processor is replaced by
several processors and an arbitration unit. Any time that
a processor needs access to anything beyond its internal
cache (e.g., memory or I/O), the processor uses the
arbitration unit to arbitrate for the external bus that connects
the processors together. Given that the arbitration
units are finite state engines initialized to the same state, 
they will follow the same sequence of arbitrations as 
long as the processors are functioning correctly. 
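
The determinism argument can be sketched in Python. A round-robin policy is assumed here only for illustration; the text does not say which arbitration policy the arbitration units use, and the class name is hypothetical. The point is that two state machines started in the same state and given the same request sequence produce the same grant sequence.

```python
class RoundRobinArbiter:
    """Finite state machine granting a shared bus to one requester at a time.

    Round-robin is an assumed policy; the only state is the most recently
    granted processor.
    """

    def __init__(self, n_processors):
        self.n = n_processors
        self.last = n_processors - 1  # processor 0 has priority at start

    def grant(self, requests):
        """Grant the bus to one processor id from the set `requests`.

        Returns the granted id, or None when nothing is requesting.
        """
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None
```

If one instance sees a request one arbitration cycle later than the other, the grant sequences can diverge, which is why the request timing has to match in every compute element.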

Uni-Processor (Non-SMP) System

[0035] Fig. 1 illustrates a fault tolerant, non-SMP system
100 that emulates clock lockstep operation. In general,
all computer systems perform two basic operations:
(1) manipulating and transforming data, and (2)
moving the data to and from mass storage, networks,
and other I/O devices. The system 100 divides these
functions, both logically and physically, between two
separate processors. For this purpose, each half of the
system 100, called a tuple, includes a compute element
("CE") 105 and an I/O processor ("IOP") 110. The compute
element 105 processes user application and operating
system software. I/O requests generated by the
compute element 105 are redirected to the I/O processor
110. This redirection is implemented at the device
driver level. The I/O processor 110 provides I/O resources,
including I/O processing, data storage, and network
connectivity. The I/O processor 110 also controls synchronization
of the compute elements.
[0036] The system 100 is fault tolerant in that it continues
to operate transparently to its users in the presence
of any single hardware failure. The system 100
emulates a traditional computing environment by partitioning
it into two components. The compute element
105 handles all compute tasks for the operating system
and any applications. The I/O processor 110 handles all
I/O devices. Thus, the I/O processor handles all of the
asynchronous activities associated with a computer, 
while the compute element handles all of the synchro- 
nous compute activities. 

[0037] To provide the necessary redundancy for fault 
tolerance, the system 100 includes at least two compute
elements 105 and at least two I/O processors 110. The
two compute elements 105 operate in lockstep while the
two I/O processors 110 are loosely coupled. The I/O
processors 110 feed both compute elements 105 the exact
same data at a controlled place in the instruction
streams of the compute elements. The I/O processors
verify that the compute elements generate the same I/O
operations and produce the same output data at the
same time. The I/O processors also cross check each 
other for proper completion of requested I/O activity. 
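
The output comparison performed by an I/O processor can be sketched in Python as follows. This is only an illustration under stated assumptions: each request is modeled as a hypothetical `(operation, address, payload)` tuple, the function name is invented, and the timing aspect of the comparison is not modeled.

```python
from itertools import zip_longest


def first_divergence(stream_a, stream_b):
    """Return the index of the first I/O request on which two compute
    elements disagree, or None if the two request streams are identical."""
    missing = object()  # sentinel marking a stream that ended early
    pairs = zip_longest(stream_a, stream_b, fillvalue=missing)
    for index, (req_a, req_b) in enumerate(pairs):
        if req_a is missing or req_b is missing or req_a != req_b:
            return index
    return None
```

With only two compute elements, a mismatch shows that a fault exists but does not by itself identify which element is faulty.
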
[0038] The system 100 uses a software-based approach
in a configuration based on inexpensive, industry
standard processors. For example, the compute elements
105 and I/O processors 110 may be implemented
using Pentium Pro processors available from Intel
Corporation. The system may run unmodified, industry-standard
operating system software, such as the Windows
NT operating system available from Microsoft Corporation,
as well as industry-standard applications software.
This permits a fault tolerant system to be configured
by combining off-the-shelf, Intel Pentium Pro-based
servers from a variety of manufacturers, which
results in a fault tolerant or disaster tolerant system with
low acquisition and life cycle costs.
[0039] Each compute element 105 includes a processor
115, memory 120, and an interface card 125 (also
referred to as a Marathon interface card, or MIC). The
interface card 125 includes drivers for communicating
with two I/O processors simultaneously, as well as com- 
parison and test logic that assures results received from 
the two I/O processors are identical. In the fault tolerant 
system 100, the interface card 125 of each compute el- 
ement 105 is connected by high speed links 130, such 
as fiber optic links, to interface cards 125 of the two I/O 
processors 110. The interface cards 125 may be imple- 
mented as PCI-based adapters. 
[0040] Each I/O processor 110 includes a processor 
115, memory 120, an interface card 125, and I/O adapters
135 for connection to I/O devices such as a hard
drive 140 and a network 145. As noted above, the interface
card 125 of each I/O processor 110 is connected
by high speed links 130 to the interface cards 125 of the
two compute elements 105. In addition, a high speed
link 150, such as a private ethernet link, is provided between
the two I/O processors 110.
[0041] All I/O task requests from the compute ele- 
ments 105 are redirected to the I/O processors 110 for 
handling. The I/O processor 110 runs specialized software
that handles all of the fault handling, disk mirroring,
system management, and resynchronization tasks required
by the system 100. By using a multitasking operating
system, such as Windows NT, the I/O processor
110 may run other, non-fault tolerant applications. In
general, a compute element may run Windows NT Server
as an operating system while, depending on the way
that the I/O processor is to be used, an I/O processor 
may run either Windows NT Server or Windows NT 
Workstation as an operating system. 

[0042] The two compute elements 105 run lockstep
control software, also referred to as quantum synchronization
software, and execute the operating system
and the applications in emulated clock lockstep. Disk
mirroring takes place by duplicating writes on the disks
140 associated with each I/O processor. If one of the
compute elements 105 should fail, the other compute
element 105 keeps the system running with a pause of
only a few milliseconds to remove the failed compute
element 105 from the configuration. The failed compute
element 105 then can be physically removed, repaired,
reconnected, and turned on. The repaired compute element
then is brought back automatically into the configuration
by transferring the state of the running compute
element to the repaired compute element over the
high speed links and resynchronizing. The states of the
operating system and applications are maintained 
through the few seconds it takes to resynchronize the 
two compute elements so as to minimize any impact on 
system users. 

[0043] If an I/O processor 110 fails, the other I/O processor
110 continues to keep the system running. The
failed I/O processor then can be physically removed, repaired
and turned back on. Since the I/O processors are
not running in lockstep, the repaired system may go
through a full operating system reboot, and then may be
resynchronized. After being resynchronized, the repaired
I/O processor automatically rejoins the configuration
and the mirrored disks are re-mirrored in background
mode over the private connection 150 between
the I/O processors. A failure of one of the mirrored disks
is handled through the same process. 
[0044] The connections to the network 145 also are
fully redundant. Network connections from each I/O
processor 110 are booted with the same address. Only
one network connection is allowed to transmit messages,
while both are allowed to receive messages. In this
way, each network connection monitors the other
through the private ethernet. Should either network connection
fail, the I/O processors will detect the failure and
the remaining connection will carry the load. The I/O
processors notify the system manager in the event of a 
failure so that a repair can be initiated. 
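
The transmit/receive arrangement can be sketched in Python as a small state model. The class, the connection names `iop_a` and `iop_b`, and the notification string are hypothetical; the sketch does not model how a failure is detected, only what happens to the transmit role once one is reported.

```python
class RedundantNetwork:
    """Two connections sharing one address: both receive, one transmits."""

    def __init__(self):
        self.healthy = {"iop_a": True, "iop_b": True}
        self.transmitter = "iop_a"  # only this connection may transmit

    def receivers(self):
        """Names of the connections currently able to receive."""
        return [name for name, ok in self.healthy.items() if ok]

    def report_failure(self, name):
        """Mark a connection failed and move the transmit role if needed."""
        self.healthy[name] = False
        if self.transmitter == name:
            survivors = self.receivers()
            self.transmitter = survivors[0] if survivors else None
        return f"notify system manager: {name} network connection failed"
```

If the receive-only connection fails instead, the transmitter is unchanged and only the notification is produced.
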
[0045] While Fig. 1 shows both connections on a single
network segment, this is not a requirement. Each I/O
processor's network connection may be on a different
segment of the same network. The system also accom- 
modates multiple networks, each with its own redundant 
connections. The extension of the system to disaster tol- 
erance requires only that the connection between the 
tuples be optical fiber or a connection having compatible 
speed. With such connections, the tuples may be 
spaced by distances of a kilometer or more. Since the 
compute elements are synchronized over this distance, 
the failure of a component or a site will be transparent 
to the users. 

[0046] Fig. 2 provides a summarized view of the system
100 of Fig. 1. The system includes redundant compute
elements 105 ("CEs") and I/O processors 110
("IOPs"). Each CE 105 is responsible for all computing
and may be implemented using an industry standard
motherboard. Each IOP 110 is responsible for access
to I/O devices, and for system control. The IOPs run
asynchronously of each other and verify that the CEs
are performing the same operations in the same order.
The IOPs also track each other's I/O completion to ensure
that no I/O is lost.

[0047] The CEs generate the same outputs in the ex- 
act same sequence, and run in emulated clock lockstep, 
even though the CE clocks are asynchronous to each 
other. The CEs are initialized to the same state and are
fed consistent inputs at exactly the same time. The CEs
are periodically realigned using a self-generated interrupt
that is related to the occurrence of a quantum of
clock cycles (e.g., 100,000 clock cycles) and is referred
to as a quantum interrupt ("QI"). By contrast, the prior
Endurance 4000 system used QIs related to the completion
of a quantum of instructions. All inputs to the CEs
are delivered at either an output window or after the 
completion of an instruction quantum. Both of these 
points are guaranteed to occur at the same point in the 
instruction streams of the CEs. The approach employed 
by the Endurance 4000 system is described in U.S. Patent
Nos. 5,600,784 and 5,615,403.
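
The idea of holding inputs until a quantum boundary can be sketched in Python. The quantum of 100,000 clock cycles comes from the example in the text; the class, its methods, and the choice to deliver every held input at the next quantum interrupt are assumptions made for illustration.

```python
QUANTUM_CYCLES = 100_000  # example quantum size given in the text


class ComputeElementModel:
    """Toy model: inputs are held and delivered only at quantum boundaries."""

    def __init__(self):
        self.cycle = 0
        self.pending = []    # inputs that arrived but are not yet visible
        self.delivered = []  # (cycle, input) pairs seen by the instruction stream

    def post_input(self, data):
        self.pending.append(data)  # may arrive at any local time

    def run(self, cycles):
        """Advance the local clock, firing a quantum interrupt at each boundary."""
        end = self.cycle + cycles
        while True:
            boundary = (self.cycle // QUANTUM_CYCLES + 1) * QUANTUM_CYCLES
            if boundary > end:
                break
            self.cycle = boundary
            # Quantum interrupt: release the held inputs at this cycle count.
            self.delivered += [(boundary, item) for item in self.pending]
            self.pending.clear()
        self.cycle = end
```

In this model, two instances that receive the same input at different local times within a quantum both observe it at the same cycle count.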

Multi-Processor (SMP) System 

[0048] Fig. 3 illustrates a fault resilient/fault tolerant, 
symmetric multi-processing ("SMP") system 300. Each 
CE 305 of the system 300 includes a collection of proc- 
essors 310 connected by a common processor bus 315 
and an arbitration unit 320. The processors use the bus 
315 and arbitration unit 320 to access a shared memory 
325, and to access two IOPs 330 through an interface
card 335 and high speed data links 340. 
[0049] The IOPs 330 operate identically to the IOPs
110 of the system 100. Thus, the IOPs handle all I/O
task requests from the processors 310 and run specialized
software that handles all of the fault handling, disk
mirroring, system management, and resynchronization
tasks required by the system 300.
[0050] One processor 310 (identified as processor
310a) of each CE 305 serves as a primary processor
and runs lockstep control software in addition to executing
an operating system and applications in emulated
clock lockstep with the other CE. The remaining processors
in each CE 305 execute the operating system
and applications in emulated clock lockstep with the other
CE.

[0051] Referring to Fig. 4, a motherboard 400 for use
in a CE 305 of the system 300 includes two or more processors
310. Each processor may operate at a clock
speed of, for example, 300 MHz or 350 MHz. The processors
310 are interconnected and connected to the arbitration
unit 320 by the bus 315, which is also referred
to as the processor bus or the front side bus ("FSB").
The FSB typically operates at a clock speed of 100 MHz.
The arbitration unit 320 is commonly referred to as the
North Bridge, since it serves as a bridge from the processor
bus 315 to the memory 325 and to the PCI bus
705. The PCI bus 705 typically is a 32 bit bus operating
at 33 MHz or a 64 bit bus operating at 66 MHz. The interface
card 335 is implemented as a PCI device connected
to the PCI bus 705.

[0052] The PCI bus 705 is also connected to another
component, which is commonly referred to as the South
Bridge 710. The South Bridge includes an advanced programmable
interrupt controller ("APIC") 715 that provides
interrupts to the processors 310 on an APIC bus 720.
The processors 310 include their own APICs 725 that
receive the interrupts. The APIC bus may be, for example,
a 16.6 MHz bus.

[0053] The motherboard 700 may be implemented
using an industry standard motherboard. In this case,
the motherboard 700 also may include a number of components
that, though standard on the motherboard, are
not used by the system 300. These components include
a video card 730 connected to the North Bridge 320 by
an AGP bus 735 (or by the PCI bus); one or more SCSI
controllers 740 connected to the PCI bus 705; one or
more PCI devices 745 connected to the PCI bus 705;
an IDE drive controller 750 connected to the South
Bridge 710; an ISA (16 bit, 8 MHz) or EISA (32 bit, 8
MHz) bus 755 connected to the South Bridge 710; one
or more ISA or EISA devices 760 connected to the bus
755; and a super I/O controller 765 connected to the bus
755 to provide keyboard, mouse, and floppy drive support,
as well as parallel and serial ports. These components,
if present, are not used by the CE 305.
[0054] Marathon's prior Endurance 4000 system provided
a fault tolerant structure in which processors were
kept in lockstep while disregarding time skew. In essence,
the time difference between processors was not
important, assuming asynchrony between processors
did not affect instruction lockstep. Memory refresh and
DMA interactions, which had no impact on the lockstep
of the processors, did affect the timing asynchrony. Video
processing had both a timing and an instruction component.
Care was taken to ensure that video and quantum
processing created neither instruction nor data di-